Using as a REST service


NLP Architect server is a hug REST server that is able to run predictions on different models configured using NLP Architect. The server includes a web front-end exposing the model’s annotations.

Currently we provide 4 services:

The server has two main components:

  • Service which is a representation of each model’s API.

  • nlp_architect.server.serve module which is a hug application which handles processing of HTTP requests and initiating calls to the desired model.

The server supports extending with new services using provided API classes, see Annotation Structure Types - Server Responses for more details.

Running NLP Architect Server

Starting the server

To run the server run the following command:

nlp_architect server -p 8080

The server will run locally on port 8080 and can be queried on /inference directive.

To access the visualization - http://localhost:8080


The following headers are required when communicating with the server:

  • Content-Type: “application/json” or “application/gzip”

  • Response-Format: The response format, “json” or “gzip”. The default response format is json.

The request content has the following format:

    "model_name": "ner" | "bist" | "intent_extraction" | "machine_comprehension",
        {"id": 1,
        "doc": "Time flies like an arrow. fruit flies like a banana."},
        {"id": 2,
        "doc": "the horse passed the barn fell"},
        {"id": 3,
        "doc": "the old man the boat"}

In the example above, model_name is the desired model to run the documents through and each input document is marked with an id and content.


The server supports 3 types of Responses (see Annotation Structure Types - Server Responses bellow).


Request annotations using the NER model:

  • BIST parser - Core NLP models annotation structure

Annotation Structure Types - Server Responses

The server supports 2 types of annotation structure (responses from the server):

A annotation of a Core NLP model (Part-of-speech (POS), lemma, dependency relations etc.), usually a word-to-label annotation.

An annotation of a more high-level model (Intent Extraction, NER, Chunking, etc.). usually a span-to-label annotation used for higher level of nlp tasks and applications.

Core NLP models annotation structure

CoreNLPDoc class is hosting the Core NLP models annotation structure. (can be imported using: from nlp_architect.common.core_nlp_doc import CoreNLPDoc).

  "doc_text": "<the_document_text>",
  "sentences": list of sentences, each word in a sentence is represented in \
    a dict (list(list(dict))). the dict is structured as follows:
              "start": <start_index> (int),
              "len": <word_length> (int),
              "pos": <POS_label> (string),
              "ner": <NER_label> (string),
              "lemma": <Lemma_string> (string),
              "gov": <GOV_index> (int),
              "rel": <Dependency_Relation_label> (string)

High-level models annotation structure

HighLevelDoc class is hosting the High-level models annotation structure. (can be imported using: from nlp_architect.common.high_level_doc import HighLevelDoc).

    "doc_text" : "<the_document_text>",
    "annotation_set" : list of all annotations in document (list(string)),
    "spans" : list of span dict (list(dict)), each span_dict is structured as follows:
              "end": <end_index> (int),
              "start": <start_index> (int),
              "type": <annotation_string> (string)

Machine Comprehension structure

Only for the MachineComprehensionApi response. .. code:: json


“id”: “<id>”, “answer”: “<answer_text>”


Adding new services

Adding a new service to the server

All the services are declared in a JSON file found at nlp_architect/server/services.json.

In order to add a new service to the server you need to go over 3 steps:

  1. Detect the type of your service suitable for your model, either Core NLP model or High-level model.

  2. Create an API class for your service in nlp_architect/api/ folder. Make your class inherit from AbstractApi and implement all relevant methods. Notice that your inference class_method must return either CoreNLPDoc or HighLevelDoc.

  3. Add the definition of the new service to services.json as follows:

"<service_name>" : {"file_name": "<api_file_name>", "type": <"core"/"high_level>"}

NLP Architect Server Deployment Tutorial


This tutorial walks you through the multiple steps for deploying NLP Architect server locally. Deployment allows the server to scale well based on user requests.

SW Stack

Various layers of the software stack are as follows



  1. Ubuntu 16.04

  2. Must have root privileges

  3. Virtualization must be enabled in your computer BIOS

Kubectl Installation

kubectl is a command line interface for running commands against Kubernetes clusters. Following are the installation instructions

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s | sudo apt-key add -
sudo touch /etc/apt/sources.list.d/kubernetes.list
echo "deb kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Minikube Installation

Minikube provides a simple way of running Kubernetes on your local machine. Following are the installation instructions

curl -Lo minikube && chmod +x minikube && sudo mv minikube /usr/local/bin/
sudo minikube start --vm-driver=none
Follow instructions posted by minikube

Docker Installation

sudo apt-get install ca-certificates curl gnupg2 software-properties-common
sudo add-apt-repository \
"deb [arch=amd64] \
$(lsb_release -cs) stable"
sudo apt-key adv --keyserver --recv-keys 7EA0A9C3F273FCD8
sudo apt-get update && sudo apt-get install docker-ce
sudo usermod -aG docker $USER
# if the following command doesn't work, re-login again
exec su -l $USER

Launch Docker Registry

Follow to fix proxy things if need be
docker run -d -p 5000:5000 --restart=always --name registry registry:2

Build DockerFile

Create a Dockerfile with the following content and save it in your deployment directory.

FROM python:3.6 AS builder

RUN apt-get update
RUN apt-get install -y git

# check out project at current location, hopefully this is a tag eventually
# right now this is latest commit from at 12:21pm 8/6/18
RUN git clone https://x-access-token:"${GITHUB_ACCESS_TOKEN}"

# prevent keeping token in final image
FROM python:3.6

COPY --from=builder /nlp-architect /src/nlp-architect


WORKDIR /src/nlp-architect
RUN git fetch
RUN git checkout ${NLP_ARCH_VERSION}

# install nlp-architect project itself
RUN pip3 install .

# run NLP Architect server
CMD [ "nlp_architect", "server", "-p", "8080"]

Run the following commands to build the docker file

docker build --build-arg GITHUB_ACCESS_TOKEN=${GITHUB_ACCESS_TOKEN} --build-arg HTTP_PROXY=${HTTP_PROXY} --build-arg HTTPS_PROXY=${HTTPS_PROXY} --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t nlp_architect .
docker tag nlp_architect localhost:5000/nlp_architect
docker push localhost:5000/nlp_architect
docker run --rm -it -p 8080:8080 localhost:5000/nlp_architect

Deploy Kubernetes

Create a depolyment.yaml file in the same directory as your deployment. Fill the deployment.yaml file with the following contents

apiVersion: extensions/v1beta1
kind: Deployment
  name: nlp-server
  replicas: 1
        run: nlp-server
        id: "0"
        app: nlp-server
      - name: nlp-server
        image: localhost:5000/nlp_architect
        imagePullPolicy: Always
            cpu: 1300m
            memory: 1600Mi
            cpu: 1100m
            memory: 1300Mi
        - containerPort: 8080


apiVersion: v1
kind: Service
  name: nlp-server
  type: NodePort
    app: nlp-server
  - name: http
    port: 8080
    targetPort: 8080


apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
  name: nlp-server
    apiVersion: apps/v1
    kind: Deployment
    name: nlp-server
  minReplicas: 3
  maxReplicas: 10
  - type: Resource
      name: cpu
      targetAverageUtilization: 50

Run the following commands to create a deployment on the kubernetes cluster

kubectl create -f deployment.yaml
# run the following command to see your pods spin up; there will be 3 of them if your machine has enough resources
watch -n1 kubectl get pods
# this next command gives you the {nodeportvalue} below, it'll be in the format `8080:{nodeportvalue}`
kubectl get svc
# this next command will show you the hpa created with this deployment
kubectl get hpa
# if you ever want to see everything at once, run this:
kubectl get all
# if there is a problem, run this:
kubectl logs {podname}
# if there is a problem with the deployment itself, run this:
kubectl describe pod {podname}
# to redeploy, run this, and then rerun the `kubectl create -f deployment.yaml` command
kubectl delete -f deployment.yaml

To test the server

curl --noproxy "*" $(sudo minikube ip):{nodeportvalue}
Where nodeportvalue is from kubectl get svc

Now you can browse nlp architect at the following url: http://{operating_system_ip}:8080